{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z5HNN6aB4DKS"
   },
   "source": [
    "# DAGs: D-Separation and Conditonal Independencies, Adjustment via Backdoor and Swigs, Equivalence Classes, Falsifiability Tests.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BGpBoIOplvuj"
   },
   "outputs": [],
   "source": [
    "!pip install pgmpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WzEAUJ-N3zTR"
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FkByBeke4Gli"
   },
   "source": [
    "# Graph Generation and Plotting\n",
    "\n",
    "The following DAG is due to Judea Pearl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "pEuMutfK02cN"
   },
   "outputs": [],
   "source": [
    "import networkx as nx\n",
    "\n",
    "digraph = nx.DiGraph([('Z1', 'X1'),\n",
    "                      ('X1', 'D'),\n",
    "                      ('Z1', 'X2'),\n",
    "                      ('Z2', 'X3'),\n",
    "                      ('X3', 'Y'),\n",
    "                      ('Z2', 'X2'),\n",
    "                      ('X2', 'Y'),\n",
    "                      ('X2', 'D'),\n",
    "                      ('M', 'Y'),\n",
    "                      ('D', 'M')])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DkCqH_N0pXH5"
   },
   "outputs": [],
   "source": [
    "from pgmpy.base.DAG import DAG\n",
    "\n",
    "G = DAG(digraph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Yc5wOsquyqCt"
   },
   "outputs": [],
   "source": [
    "import pylab as plt\n",
    "\n",
    "nx.draw_planar(G, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VXrVwLlp2B8x"
   },
   "outputs": [],
   "source": [
    "print(list(G.predecessors(\"X2\")))\n",
    "print(list(G.successors(\"X2\")))\n",
    "print(list(nx.ancestors(G, \"X2\")))\n",
    "print(list(nx.descendants(G, \"X2\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OT0PUzu31uwc"
   },
   "source": [
    "# Find Paths Between D and Y\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "21feRbRtQj-3"
   },
   "outputs": [],
   "source": [
    "list(nx.all_simple_paths(G.to_undirected(), \"D\", \"Y\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RuRQKNRX1zod"
   },
   "source": [
    "# List All Testable Implications of the Model\n",
    "\n",
    "Here we use D-separation to list all the conditional independence relations deduced from the DAG."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "pKtc2-Z411uB"
   },
   "outputs": [],
   "source": [
    "# these returns all conditional independencies even among two sets of variables\n",
    "# conditional on a third set\n",
    "dseps = G.get_independencies()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "W6npM1sB13yU"
   },
   "outputs": [],
   "source": [
    "# we display only the ones that correpond to pairs of singletons\n",
    "for dsep in dseps.get_assertions():\n",
    "    if len(dsep.get_assertion()[1]) == 1:\n",
    "        print(dsep)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1kdiPPXSGI3C"
   },
   "source": [
    "# Backdoor and frontdoor adjustments between D and Y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qH4IZynbBrLP"
   },
   "outputs": [],
   "source": [
    "from pgmpy.models.BayesianModel import BayesianNetwork\n",
    "from pgmpy.inference.CausalInference import CausalInference\n",
    "\n",
    "inference = CausalInference(BayesianNetwork(G))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eZwZnnUHF7__"
   },
   "outputs": [],
   "source": [
    "inference.get_all_backdoor_adjustment_sets('D', 'Y')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "tIwu7UapGUSD"
   },
   "outputs": [],
   "source": [
    "inference.get_all_frontdoor_adjustment_sets('D', 'Y')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nf-Gu_7wGaME"
   },
   "outputs": [],
   "source": [
    "inference.get_minimal_adjustment_set('D', 'Y')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "D0lSm4M0Gg6d"
   },
   "outputs": [],
   "source": [
    "inference.is_valid_backdoor_adjustment_set('D', 'Y', 'X2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vpMg1hkF2Fee"
   },
   "source": [
    "# Testing DAG Validity by Checking Implied Conditional Independencies with DoWhy-GCM\n",
    "\n",
    "We found all the implied conditional independence relations above. Can we test them?  The answer is yes, and is particularly simple if the DAG is associated with a linear SEM.\n",
    "\n",
    "To illustrate, we simulate the data from a Linear SEM associated to DAG G, and perform a test of conditional independence restrictions, using a kernel based test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ukPS9xRXG2ZN"
   },
   "outputs": [],
   "source": [
    "!apt install libgraphviz-dev\n",
    "!pip install pygraphviz\n",
    "!pip install dowhy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WoxEhW_8iKVQ"
   },
   "outputs": [],
   "source": [
    "# generate data from the SCM\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def gen_data(n):\n",
    "    Z1 = np.random.normal(0, 1, size=n)\n",
    "    Z2 = np.random.normal(0, 1, size=n)\n",
    "    X1 = Z1 + np.random.normal(0, 1, size=n)\n",
    "    X2 = Z1 + Z2 + np.random.normal(0, 1, size=n)\n",
    "    X3 = Z2 + np.random.normal(0, 1, size=n)\n",
    "    D = X1 + X2 + np.random.normal(0, 1, size=n)\n",
    "    M = D + np.random.normal(0, 1, size=n)\n",
    "    Y = M + X2 + X3 + np.random.normal(0, 1, size=n)\n",
    "    return pd.DataFrame({'Z1': Z1, 'Z2': Z2, 'X1': X1, 'X2': X2, 'X3': X3, 'D': D, 'M': M, 'Y': Y})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AcQcetSJnJIE"
   },
   "outputs": [],
   "source": [
    "data = gen_data(5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "y6pK_7AFng0u"
   },
   "outputs": [],
   "source": [
    "import dowhy.gcm as gcm\n",
    "from dowhy.gcm.independence_test import kernel_based\n",
    "\n",
    "causal_model = gcm.StructuralCausalModel(digraph)\n",
    "rej = gcm.refute_causal_structure(causal_model.graph, data, conditional_independence_test=kernel_based)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9wnneuDwHLCF"
   },
   "source": [
    "The rejection result contains two types of tests:\n",
    "- local markov tests: that test whether any node is independent of its non-descendants conditional on its parents (these also imply global markov conditions)\n",
    "- edge dependence tests: these test that the edges that were included in the graph carry sufficient amount of correlation.\n",
    "\n",
    "In our case, we will only be focusing on the local markov tests. We will reject the graph if any of the local markov tests fails, but we will ignore the other tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Tf581RGReSHM"
   },
   "outputs": [],
   "source": [
    "rej"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7Ynmp1uW3nw1"
   },
   "source": [
    "Next we replace $D$ by $\\bar D$ generated differently:\n",
    "$$\n",
    "\\bar D= (D + Y)/2\n",
    "$$\n",
    "Basically $\\bar D$ is an average of $D$ and $Y$ generated by $D$.  We then test if the resulting collection of random variables satisifes conditional indepdendence restrictions, exploiting linearity.  We end up rejecting these restrictions and therefore the validity of this model for the data generated in this way.  This makes sense, because the new data no longer obeys the previous DAG structure.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "dHXvxWJyq52H"
   },
   "outputs": [],
   "source": [
    "data['D'] = (data['D'] + data['Y']) / 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RFLYfcHQ3vWH"
   },
   "outputs": [],
   "source": [
    "rej = gcm.refute_causal_structure(causal_model.graph, data, conditional_independence_test=kernel_based)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5qv6du3t3vyX"
   },
   "outputs": [],
   "source": [
    "rej"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1VEi5vhKk3aM"
   },
   "source": [
    "# Identification with Front-Door and DoWhy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1I78DtXJP-2z"
   },
   "outputs": [],
   "source": [
    "data = gen_data(5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sQeyB38w3xgS"
   },
   "outputs": [],
   "source": [
    "import dowhy\n",
    "\n",
    "causal_graph = \"\"\"digraph {\n",
    "Z1; Z2; X1; X2; X3; D; M; Y;\n",
    "Z1 -> X1;\n",
    "X1 -> D;\n",
    "Z1 -> X2;\n",
    "Z2 -> X3;\n",
    "X3 -> Y;\n",
    "Z2 -> X2;\n",
    "X2 -> Y;\n",
    "X2 -> D;\n",
    "M -> Y;\n",
    "D -> M;\n",
    "}\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hxdpclFAkybR"
   },
   "outputs": [],
   "source": [
    "cm = dowhy.CausalModel(data=data,\n",
    "                       treatment='D',\n",
    "                       outcome='Y',\n",
    "                       graph=causal_graph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "yAAzz51slT-z"
   },
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "\n",
    "cm.view_model(file_name='dag')\n",
    "Image.open('dag.png')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Kb0tdv2JPWBW"
   },
   "outputs": [],
   "source": [
    "# method seems not to be doing false discovery rate (multiple testing/joint inference) control\n",
    "# some tests might fail because of that when k is large (k=size of conditioning set)\n",
    "print(cm.refute_graph(k=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "6Ytk5z5KlVB5"
   },
   "outputs": [],
   "source": [
    "identified_estimand = cm.identify_effect(proceed_when_unidentifiable=True)\n",
    "print(identified_estimand)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rZJIMkgrsd1A"
   },
   "outputs": [],
   "source": [
    "estimate = cm.estimate_effect(identified_estimand, method_name='frontdoor.two_stage_regression')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MdpARgZSs5Ms"
   },
   "outputs": [],
   "source": [
    "print(estimate)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NgGRQ5cas94t"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
